Notes on the Generalisation Performance and Fisher Consistency of Multicategory Classifiers
Abstract
Existing bounds on the generalisation performance of multicategory classifiers are reviewed and considered in the light of the framework of Hill and Doucet (2005). Insights obtained through this framework are used to refine these bounds further. The framework's implications for the Fisher consistency of multicategory classifiers are also discussed.

1 Acronyms

IID  Independent and Identically Distributed
SRM  Structural Risk Minimisation
SVC  Support Vector Classification
VC   Vapnik-Chervonenkis

2 Introduction

An important aspect of many kernel-based algorithms, such as Support Vector Classification (SVC), is that Structural Risk Minimisation (SRM) ideas can be applied to obtain distribution-free bounds on performance. This approach underlies the initial work on SVC in particular, and gives rise to concepts such as the Vapnik-Chervonenkis (VC) dimension. In this paper we summarise the body of work concerned with bounding the performance of multicategory classifiers. This work was originally published by Guermeur (2002), but it draws heavily on Elisseeff et al. (1999), to the extent that fully understanding Guermeur (2002) requires reading it in conjunction with Elisseeff et al. (1999). Further insight can be found in the work of Paugam-Moisy et al. (2000). New material in the same vein is then presented; it is somewhat more straightforward and closer in formulation to the original two-class approach. This material draws on the insights of Hill and Doucet (2005) to reduce the multidimensional bounding problem to a scalar one, and thus to make full use of the more traditional approaches to bounding. These approaches are also drawn on by Elisseeff et al. (1999), but viewing the problem in the manner proposed here makes it possible to adopt them virtually unchanged.
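For context, the classical two-class SRM result referred to above can be stated (in one common form) as follows: with probability at least 1 - \delta over an IID sample of size n, every function f drawn from a class of VC dimension h satisfies

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}},
```

where R(f) is the true risk and R_{\mathrm{emp}}(f) the empirical risk. It is bounds of this scalar, two-class type that the reduction discussed in this paper aims to reuse in the multicategory setting.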
The key references for this work are Bartlett (1998) and Williamson et al. (2001). Fisher consistency, another desirable classifier property, is also examined in this work; it has been discussed in the multicategory setting by Tewari and Bartlett (2007) and Zhang (2004a,b).
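A minimal sketch of the kind of scalar reduction at issue: the standard multicategory margin assigns each example the score of its true class minus the best competing score, collapsing a vector-valued classifier output to a single scalar per example. This is an illustrative construction in the spirit of the reduction, not the specific formulation of Hill and Doucet (2005); the function names are ours.

```python
import numpy as np

def multicategory_margin(scores, y):
    """Scalar margin for multicategory outputs: score of the true class
    minus the largest competing score. Positive margin means the example
    is correctly classified; the vector output is reduced to one scalar."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    true_scores = scores[np.arange(n), y]
    masked = scores.copy()
    masked[np.arange(n), y] = -np.inf   # exclude the true class
    best_other = masked.max(axis=1)
    return true_scores - best_other

def empirical_margin_error(scores, y, gamma):
    """Fraction of examples with margin below gamma -- the empirical
    quantity that scalar margin-based generalisation bounds control."""
    return float(np.mean(multicategory_margin(scores, y) < gamma))
```

Once the problem is phrased this way, two-class margin bounds of the Bartlett (1998) type can be applied to the scalar margins essentially unchanged.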